7- Transcription Factor Activity - How to score & interpret them

Author

Marc Elosua Bayes

Published

March 4, 2024

Introduction

scRNA-seq data returns individual molecular reads for each cell representing the expression of each gene in each cell. However, transcript abundances at the individual gene level can be hard to interpret. Another confounding factor with these readouts is the high sparsity of the data. This sparsity acutely affects genes with low mRNA abundance (Figure 1) (mereu2020?) . Transcription factors (TFs) are key players involved in regulating the present and future cell states by binding to regulatory regions in the DNA and driving gene expression programs (baskar2022?). Therefore, they are tightly regulated and are often found at low abundances due to their powerful effects on the cells. Hence, being able to quantify the activity of TFs in a cell can provide very valuable information when characterizing the biological processes underlying a cell type or state. However, due to their low expression they severely suffer from dropout events and their mRNA abundance can’t be accurately quantified by looking at the number of UMIs. To address this issue methods have been developed to quantify their activities by leveraging the expression of the genes they regulate.


Figure 1. Dropout Probability vs Expression level

In this vignette we are going to go over the best practices on how to compute these activites, what we need to take into account when computing them and follow a recent benchmarking paper to determine which are the best reference databases (müller-dott2023?) and tools to use like decoupler Badia-i-Mompel et al. (2022)!

Before we start here are some key concepts that will help us and frame the vignette!

  • What is a transcription factor?

    Transcription factors are broadly understood as proteins that bind to regulatory regions of the DNA acting as key regulators of gene-expression programs (baskar2022?).

  • What information do we need to compute the activity of a TF?

    The activity of TF is scored based on the expression of the genes it regulates. Therefore, we need a database that contains which genes are regulated by each transcription factor and the relation between them. Some TF can activate the expression of some genes and repress the expression of others. There are many databases that contain this information and in this vignette we aim to provide the current state of the art databases to use, this can be considered as a gene regulatory network (GRN).

  • Do transcription factors act the same in all cell types?

    No! This is crucial to keep in mind when interpreting TF activities. If we take as an example Blimp1 (PRDM1) a well characterized TF in B and T cells it has been shown to have very different functions. In B cells, Blimp1 drives plasmablast formation and antibody secretion, whereas in T cells, Blimp1 regulates functional differentiation, including cytokine gene expression. Studies have determined both conserved and unique functions of Blimp1 in different immune cell subsets such as the unique direct activation of the igh gene transcription in B cells and a conserved antagonism with BCL6 in B cells, T cells, and myeloid cells (nadeau2022?). This is important to consider because ideally we would have gene-specific GRNs. These can be obtained from multiome datasets but most of the time we don’t have this information, hence reference GRNs are available for these cases and this is what is going to be covered in this vignette.

  • How do we score them in our dataset?

    There are many ways to score the activation of TFs as shown in the decoupler paper Badia-i-Mompel et al. (2022). However, they do not all perform the same and it is important to select a robust method. The suggested method after their benchmarking analysis is running a Univariate Linear Model (ULM) where the gene expression values are the response variable and the regulator weights in the gene signature are the explanatory one (don’t worry, we’ll go through this in more detail in a second). The obtained t-value from the fitted model is the activity score of that gene signature in that cell.

  • How do we interpret the activity obtained?

    Scoring gene signatures using Univariate Linear Models and using the resulting t-value as the scoring metric allows us to simultaneously interpret in a single statistic the direction of activity (either + or -) and its significance (the magnitude of the score).

  • Can we further interrogate the activity scores obtained?

    Yes! In fact it is very important to look past the score obtained by a cell and into which are the genes driving that activity. Sometimes with TF regulating many genes downstream it could be that just a few genes are contributing to its activity in our dataset. Therefore, if we just stopped at the activity score we could be mislead into thinking that all of the genes downstream of the TF are important when it , usually, is actually only a fraction of them. Moreover, heterogeneous gene expression between two populations can also lead to 2 cells or populations having similar scores for one TF but vastly different genes gene programs underlying them.

Libraries

Load the libraries and install the packages needed to run this notebook

if (!requireNamespace("Seurat", quietly = TRUE))
    install.packages("Seurat")

if (!requireNamespace("tidyverse", quietly = TRUE))
    install.packages("tidyverse")

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")
    
if (!requireNamespace("decoupleR", quietly = TRUE))
    remotes::install_github('saezlab/decoupleR')

if (!requireNamespace("ComplexHeatmap", quietly = TRUE))
    BiocManager::install("ComplexHeatmap")

if (!requireNamespace("glue", quietly = TRUE))
    install.packages("glue")

if (!requireNamespace("colorBlindness", quietly = TRUE))
    install.packages("colorBlindness")

if (!requireNamespace("ggpmisc", quietly = TRUE))
    install.packages("ggpmisc")

if (!requireNamespace("circlize", quietly = TRUE))
    install.packages("circlize")

library(Seurat)
library(tidyverse)
library(decoupleR)
library(ComplexHeatmap)
library(colorBlindness)
library(ggpmisc)

# Remember to set a seed so the analysis is reproducible!
set.seed(687)

Load data

We’re going to be working with a dataset from the paper - Immunophenotyping of COVID-19 and influenza highlights the role of type I interferons in development of severe COVID-19 Download data from cellxgene portal.

# Download the data in data/ directory
# download.file(
#     url = "https://datasets.cellxgene.cziscience.com/d8e35450-de43-451a-9979-276eac688bce.rds",
#     destfile = "../data/workshop-data.rds",
#     method = "wget",
#     extra = "-r -p --random-wait")
# We can also use the CLI with the wget command below
# wget https://datasets.cellxgene.cziscience.com/d8e35450-de43-451a-9979-276eac688bce.rds

se <- readRDS("../data/workshop-data.rds")
# Remove Uncategorized
se <- se[, ! se$Celltype %in% c("Uncategorized1", "Uncategorized2")]

# Subset for the sake of speed
se <- se[, sample(colnames(se), 0.5 * ncol(se))]

Generate a color palette for our cell types

# https://www.datanovia.com/en/blog/easy-way-to-expand-color-palettes-in-r/
nb.cols <- length(unique(se$Celltype))
pal <- colorRampPalette(paletteMartin)(nb.cols)
names(pal) <- sort(unique(se$Celltype))

Analysis

Convert ENSEMBL IDs to Gene Symbols

Right away we can see how ensembl ids are used in the rownames. Let’s transform them into their matched symbols to make them human-readable:

head(rownames(se))
[1] "ENSG00000000003" "ENSG00000000005" "ENSG00000000419" "ENSG00000000457" "ENSG00000000460" "ENSG00000000938"

Convert to gene symbols

gene_df <- readr::read_csv(file = "../data/cov_flu_gene_names_table.csv")

symbol_id <- data.frame(index = rownames(se)) %>%
    left_join(gene_df, by = "index") %>%
    pull(feature_name)

# re-create seurat object
mtx <- se@assays$RNA$data
rownames(mtx) <- symbol_id
se <- CreateSeuratObject(counts = mtx, meta.data = se@meta.data)

Preprocessing

We will do a quick preprocessing of the data. 1) log-normalize, 2) identify highly variable genes, 3) scale their expression and 4) compute PCA on the scaled data.

se <- se %>%
    NormalizeData(verbose = FALSE) %>%
    FindVariableFeatures(nfeatures = 3000, verbose = FALSE) %>%
    ScaleData(verbose = FALSE) %>%
    RunPCA(verbose = FALSE)

Next we check the elbow plot to determine the number of PCs to use for the downstream analysis and then compute UMAP, K-nearest neighbor graph (KNN graph) and run Louvain clustering on it.

# Look at elbow plot to assess the number of PCs to use
ElbowPlot(se)

We can see a clear elbow at 10 PCs, we’re going to extend it a bit more and use 15 PCs for the downstream analysis to make sure we are not loosing any biological signal

se <- RunUMAP(se, reduction = "pca", dims = 1:30, verbose = FALSE)

Next we compute the K-nearest-neighbor graph

# se <- se %>%
#     FindNeighbors(verbose = FALSE) %>%
#     FindClusters(resolution = c(0.05, 0.1, 0.125, 0.15, 0.2), verbose = FALSE)
# 
# # Visualize the clustering on the UMAP
# DimPlot(
#     se,
#     group.by = c(
#         "RNA_snn_res.0.05", "RNA_snn_res.0.1", "RNA_snn_res.0.125",
#         "RNA_snn_res.0.15", "RNA_snn_res.0.2"))

For the purpose of this tutorial we are going to proceed with resolution 0.15

(dim_plt <- DimPlot(
    se,
    group.by = "Celltype") +
    scale_color_manual(values = pal))

TF activity scoring

As mentioned in the introduction we need a GRN specifying the relation between a TF and its downstream genes. A recent benchmark has shown how CollecTRI is the best database to estimate TF activities (müller-dott2023?). CollecTRI contains regulons of signed TF-target genes that have been compiled from 12 different sources, provides increased TF coverage, and in a recent benchmark showed a superior performance in indentifying TF perturbations from gene expression data (müller-dott2023?). You can see the full explanation in the CollecTRI gihtub page from the Saez lab. We will also follow the decoupler vignette on how to download and score these signatures.

net <- get_collectri(organism = "human", split_complexes = FALSE)
net
# A tibble: 43,178 × 3
   source target   mor
   <chr>  <chr>  <dbl>
 1 MYC    TERT       1
 2 SPI1   BGLAP      1
 3 SMAD3  JUN        1
 4 SMAD4  JUN        1
 5 STAT5A IL2        1
 6 STAT5B IL2        1
 7 RELA   FAS        1
 8 WT1    NR0B1      1
 9 NR0B2  CASP1      1
10 SP1    ALDOA      1
# ℹ 43,168 more rows

Note we are using organism = "human" since we are working with human data - mouse and rat are also available and split_complexes=False since we want to ensure that TF that regulate complexes downstream are scored when the multiple subunits in the complex are present.

In the net dataframe we have the regulation relation between each TF (source) and their downstream genes

# Look at the targets of MYC
net %>% dplyr::filter(source == "MYC")
# A tibble: 888 × 3
   source target   mor
   <chr>  <chr>  <dbl>
 1 MYC    TERT       1
 2 MYC    EPO        1
 3 MYC    ENO1       1
 4 MYC    CDC25A     1
 5 MYC    EMP1       1
 6 MYC    CXCR4      1
 7 MYC    CDKN1A    -1
 8 MYC    TP53       1
 9 MYC    ACTB       1
10 MYC    BRCA1      1
# ℹ 878 more rows
# Look at the weights of MYC
Hmisc::describe(net$mor)
net$mor 
       n  missing distinct     Info     Mean      Gmd 
   43178        0        2     0.35   0.7306   0.4662 
                      
Value         -1     1
Frequency   5816 37362
Proportion 0.135 0.865

From here we can gather that in this database MYC is regulating 886 genes-gene complexes. The weight indicates if MYC activates or represses the activity of a genes. Target genes with values > 0 indicate that MYC promotes the expression and viceversa in those with a weight < 1.

It is important to note that here we are using CollecTRI as a reference database. Therefore, some TF may be missing or the mode of regulation may be different from the expected in a cell type of interest. If you have inhouse curated GRNs obtained from inference methods such as CellOracle, pySCENIC or SCENIC+ you can plug them in here. These have the advantage that you can compute GRNs for your cell types and in turn use them in your scRNAseq dataset!

Activity inference with univariate linear model (ULM)

“Univariate Linear Model (ULM) fits a linear model for each sample and regulator, where the observed molecular readouts in mat are the response variable and the regulator weights in net are the explanatory one. Target features with no associated weight are set to zero. The obtained t-value from the fitted model is the activity ulm of a given regulator.”

Figure 2. Univariate linear model method graphical representation.

Moreover, a nice thing about ulm is that in a single statistic it provides the direction of activity (either + or -) and its significance (the magnitude of the score). Making the scores very easy to interpret!

So lets compute the activity scores for every cell in our dataset!

ulm_start <- Sys.time()
res <- decoupleR::run_ulm(
    mat = se@assays$RNA$data,
    network = net,
    .source = "source",
    .target = "target",
    .mor = "mor")

glue::glue("Time to run ulm is {round(difftime(Sys.time(), ulm_start, units = 'min'), 0)} minutes")
Time to run ulm is 11 minutes

Save data

We can see how every cells has a score for every signature!

 # Looking at the first 10 entries
res
# A tibble: 22,435,329 × 5
   statistic source condition           score   p_value
   <chr>     <chr>  <chr>               <dbl>     <dbl>
 1 ulm       MYC    AAACCCAAGAATGTTG-12  23.4 7.69e-120
 2 ulm       MYC    AAACCCAAGCATTGTC-19  25.9 6.01e-147
 3 ulm       MYC    AAACCCAAGTAAGGGA-18  27.8 2.40e-168
 4 ulm       MYC    AAACCCAAGTGTAGAT-5   24.1 6.39e-127
 5 ulm       MYC    AAACCCACACAATGCT-5   26.0 7.46e-148
 6 ulm       MYC    AAACCCACACCCTGTT-15  23.1 5.97e-117
 7 ulm       MYC    AAACCCACACCTAAAC-12  20.0 1.96e- 88
 8 ulm       MYC    AAACCCACACTGGAAG-16  29.6 2.58e-190
 9 ulm       MYC    AAACCCACAGCTACCG-2   25.8 9.26e-146
10 ulm       MYC    AAACCCACAGCTGAAG-1   24.8 3.78e-134
# ℹ 22,435,319 more rows
# Check the cell type for cell AAACCCAAGGGCAATC-1
ct_b <- "AAACCCACAGCTGAAG-1"
ct_nk <- "AAACCCAGTCATGACT-2"
se@meta.data[c(ct_b, ct_nk), "Celltype", drop = FALSE]
                       Celltype
AAACCCACAGCTGAAG-1 B cell, IgG-
AAACCCAGTCATGACT-2      NK cell
# Looking at all the scores for one specific cell
res %>%
    filter(condition %in% c(ct_b, ct_nk)) %>%
    arrange(condition, desc(score))
# A tibble: 1,542 × 5
   statistic source condition          score   p_value
   <chr>     <chr>  <chr>              <dbl>     <dbl>
 1 ulm       MYC    AAACCCACAGCTGAAG-1 24.8  3.78e-134
 2 ulm       RFXAP  AAACCCACAGCTGAAG-1 21.7  2.48e-103
 3 ulm       RFXANK AAACCCACAGCTGAAG-1 20.3  2.81e- 91
 4 ulm       CIITA  AAACCCACAGCTGAAG-1 20.2  2.70e- 90
 5 ulm       RFX5   AAACCCACAGCTGAAG-1 18.6  1.29e- 76
 6 ulm       ZBTB4  AAACCCACAGCTGAAG-1 14.2  1.11e- 45
 7 ulm       ZBED1  AAACCCACAGCTGAAG-1 13.0  1.61e- 38
 8 ulm       HIVEP2 AAACCCACAGCTGAAG-1 11.3  1.46e- 29
 9 ulm       RFX1   AAACCCACAGCTGAAG-1 10.2  1.44e- 24
10 ulm       NFKB   AAACCCACAGCTGAAG-1  8.61 7.69e- 18
# ℹ 1,532 more rows

How does a univariate linear model work?

Lets start with a toy example. Imagine a very simple scenario where we have two very simple vectors where one is double the other. We can compute the linear model and also easily visualize the relationship between both vectors:

# Define vectors of interest
vec1 <- c(-1, -0.5, 1, 2, 5)
vec2 <- c(-0.8, -0.7, 1.4, 1.8, 4)

# Run the linear model
summary(lm(vec2 ~ vec1))

Call:
lm(formula = vec2 ~ vec1)

Residuals:
       1        2        3        4        5 
-0.04956 -0.36053  0.50658  0.08465 -0.18114 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  0.07149    0.19800   0.361  0.74197   
vec1         0.82193    0.07920  10.378  0.00191 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3782 on 3 degrees of freedom
Multiple R-squared:  0.9729,    Adjusted R-squared:  0.9639 
F-statistic: 107.7 on 1 and 3 DF,  p-value: 0.001909
# Visualize the data
(p <- ggplot(mapping = aes(x = vec1, y = vec2)) +
        geom_point() +
        geom_abline(slope = 1, color = "red", linetype = "dashed") + # Passing a slope = 1
        geom_hline(yintercept = 0, size = 0.25) +
        geom_vline(xintercept = 0, size = 0.25) +
        coord_fixed() +
        theme_minimal())

# now we can add the slope of the line that best fits our data and the T value
p +
    stat_poly_line(formula = y ~ x, se = FALSE) +
    stat_poly_eq(use_label(c("eq"))) +
    stat_correlation(use_label(c("t")), label.x = 0.05, label.y = 0.9)

In the example above we see the linear relationship between both vectors and we get the slope and the T value:
- The slope indicates the what is the change in the response variable (vec2) given a 1 unit change in the predictor variable (vec1).

- The T statistic is the result of a T test. The T test assesses the significance of individual coefficients in our model. The T value indicates the number of standard errors the estimated coefficient is away from the null hypothesis (t = 0). Remember the T value is the \(\frac{coefficient}{standard~error}\).

An example of an activated TF activity score would look like this:

# Define vectors of interest
gex <- c(0, 0, 2.5, 0.65, 1.2)
mor <- c(-1, -1, 1, 1, 1)

# Run the linear model
summary(lm(gex ~ mor))

Call:
lm(formula = gex ~ mor)

Residuals:
         1          2          3          4          5 
 1.884e-16 -1.140e-16  1.050e+00 -8.000e-01 -2.500e-01 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.725      0.354   2.048    0.133
mor            0.725      0.354   2.048    0.133

Residual standard error: 0.7757 on 3 degrees of freedom
Multiple R-squared:  0.5829,    Adjusted R-squared:  0.4439 
F-statistic: 4.193 on 1 and 3 DF,  p-value: 0.133
# Visualize the data
(p <- ggplot(mapping = aes(x = gex, y = mor)) +
        geom_jitter(width = 0, height = 0.1) +
        geom_abline(slope = 1, color = "red", linetype = "dashed") + # Passing a slope = 1
        geom_hline(yintercept = 0, size = 0.25) +
        geom_vline(xintercept = 0, size = 0.25) +
        coord_fixed() +
        theme_minimal())

# now we can add the slope of the line that best fits our data and the T value
p +
    stat_poly_line(formula = y ~ x, se = FALSE) +
    stat_poly_eq(use_label(c("eq"))) +
    stat_correlation(use_label(c("t")), label.x = 0.05, label.y = 0.9)

Here we see how genes with an mor of -1 (the TF represses the expression of that gene) are not expressed (gex = 0), while those with and mor = 1 (the TF promotes the expression of that gene) are expressed!

Now lets look at a “real world” example, we want to score the B cell signature in one cell. First we are going to start by visualizing the relationship between the weights and the gene expression for 2 cells of interest, one is a B cell and the other is not.

We need to do a bit of data prep but bear with me!

# We have our gene expression matrix
mat <- se@assays$RNA$data

# We want to obtain a matrix with 1s and 0s indicating the weight each gene has for each signature
## Initialize mor_mat with all 0s
sources <- unique(net$source)
targets <- rownames(mat)
mor_mat <- matrix(0, ncol = length(sources), nrow = nrow(mat))
colnames(mor_mat) <- sources
rownames(mor_mat) <- targets
weights <- net$mor

# Fill in the matrix with the weights in the right places
for (i in seq_len(nrow(net))) {
    .source <- net$source[[i]]
    .target <- net$target[[i]]
    .weight <- weights[[i]]
    if (.target %in% targets) {
        mor_mat[[.target, .source]] <- .weight
    }
}
# labels for geom_text_repel
repel_text <- rownames(mat)
pax5_targets <- net %>% filter(source == "PAX5") %>% pull(target)
pax5_targets <- pax5_targets[pax5_targets %in% rownames(mat)]
keep <- which(rownames(mat) %in% pax5_targets)
# Set non-selected positions to NA
repel_text[-keep] <- NA

# Visualize the data
# Note that we are using geom_bin2d which bins the data to show how many genes are found at each location (x, y)
ggplot(mapping = aes(x = mat[, ct_b], y = mor_mat[, "PAX5", drop = FALSE])) +
    geom_bin2d(bins = 50) +
    ggrepel::geom_text_repel(aes(label = repel_text)) +
    geom_abline(slope = 1, color = "red", linetype = "dashed") + # Passing a slope = 1
    labs(x = "Gene Expression", y = "Gene Weight", title = glue::glue("Cell {ct_b} - B cell")) +
    coord_fixed() +
    scale_fill_gradient(name = "count", trans = "log10") +
    xlim(-0.5, 10) +
    ylim(-2, 2) + # Set the limits from -2 to 2 since the mor values can be negative
    theme_minimal() +
    # now we can add the slope of the line that best fits our data and the T value
    stat_poly_line(formula = x ~ y, se = FALSE) +
    stat_poly_eq(use_label(c("eq"))) +
    stat_correlation(use_label(c("t")), label.x = 0.05, label.y = 0.9)

# Visualize the data
ggplot(mapping = aes(x = mat[, ct_nk], y = mor_mat[, "PAX5", drop = FALSE])) +
    geom_bin2d(bins = 50) +
    ggrepel::geom_text_repel(aes(label = repel_text)) +
    geom_abline(slope = 1, color = "red", linetype = "dashed") + # Passing a slope = 1
    labs(x = "Gene Expression", y = "Gene Weight", title = glue::glue("Cell {ct_nk} - Not a B cell")) +
    coord_fixed() +
    scale_fill_gradient(name = "count", trans = "log10") +
    xlim(-0.5, 10) +
    ylim(-2, 2) +
    theme_minimal() +
    # now we can add the slope of the line that best fits our data and the T value
    stat_poly_line(formula = x ~ y, se = FALSE) +
    stat_poly_eq(use_label(c("eq"))) +
    stat_correlation(use_label(c("t")), label.x = 0.05, label.y = 0.9)

Next we are going to manually run the models for these two cells so that we can see that the results obtained from decoupleR make sense!

Lets run the a linear model to score two cell for the B cell signature

mod1 <- lm(as.matrix(mat[, ct_b, drop = FALSE]) ~ mor_mat[, "PAX5", drop = FALSE])
summary(mod1)

Call:
lm(formula = as.matrix(mat[, ct_b, drop = FALSE]) ~ mor_mat[, 
    "PAX5", drop = FALSE])

Residuals:
    Min      1Q  Median      3Q     Max 
-0.3038 -0.0581 -0.0581 -0.0581  6.6019 

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     0.058139   0.001866  31.158  < 2e-16 ***
mor_mat[, "PAX5", drop = FALSE] 0.245627   0.041251   5.954 2.64e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.34 on 33232 degrees of freedom
Multiple R-squared:  0.001066,  Adjusted R-squared:  0.001036 
F-statistic: 35.45 on 1 and 33232 DF,  p-value: 2.636e-09
0.245627 / 0.041251 # This equals the T value
[1] 5.95445
mod2 <- lm(as.matrix(mat[, ct_nk, drop = FALSE]) ~ mor_mat[, "PAX5", drop = FALSE])
summary(mod2)

Call:
lm(formula = as.matrix(mat[, ct_nk, drop = FALSE]) ~ mor_mat[, 
    "PAX5", drop = FALSE])

Residuals:
    Min      1Q  Median      3Q     Max 
-0.0887 -0.0753 -0.0753 -0.0753  6.2342 

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     0.075294   0.001962  38.371   <2e-16 ***
mor_mat[, "PAX5", drop = FALSE] 0.013364   0.043381   0.308    0.758    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3575 on 33232 degrees of freedom
Multiple R-squared:  2.856e-06, Adjusted R-squared:  -2.724e-05 
F-statistic: 0.09491 on 1 and 33232 DF,  p-value: 0.758
0.013364 / 0.043381 # This equals the T value
[1] 0.3080611

We can see how mod1 has returned a high coefficient for cell AAACCCACAGCTGAAG-1 while mod2 has returned a low coefficient for cell AAACCCAGTCATGACT-2. Moreover, when we look at the T value for the PAX5 we see how in mod1 it is 5.95 while for mod2 it is 0.3080611.

Lets check that the T values obtained manually actually match those returned by decoupleR

res %>% filter(source == "PAX5" & condition %in% c(ct_b, ct_nk))
# A tibble: 2 × 5
  statistic source condition          score       p_value
  <chr>     <chr>  <chr>              <dbl>         <dbl>
1 ulm       PAX5   AAACCCACAGCTGAAG-1 5.95  0.00000000264
2 ulm       PAX5   AAACCCAGTCATGACT-2 0.308 0.758        

Effectively, they do!

Visualization

We can directly add the ulm scores to an assay in our object and visualize the results

se[['tfulm']] <- res %>%
  pivot_wider(
      id_cols = 'source',
      names_from = 'condition',
      values_from = 'score') %>%
  column_to_rownames('source') %>%
  Seurat::CreateAssayObject(.)

# Change assay
DefaultAssay(object = se) <- "tfulm"

# Scale the data for comparison across signatures 
se <- ScaleData(se)
se@assays$tfulm@data <- se@assays$tfulm@scale.data
UMAP visualization

Plot TF activities on the UMAP embedding

plt <- FeaturePlot(
    se,
    features = c("PAX5", "EBF1", "EOMES"),
    ncol = 2,
    slot = "data") &
    scale_color_viridis_c(option = "magma")

plt + dim_plt

As mentioned in the introduction, TF are usually very lowly expressed and thus the mRNAs suffer from higher droput rates than more highly expressed genes with more mRNA copies. Using TF activities showcases how we can recover their activity by looking at the genes a TF regulates downstream.

# TF activity scores
DefaultAssay(se) <- "tfulm"
plt1 <- FeaturePlot(
    se,
    features = c("PAX5", "EBF1", "EOMES"),
    ncol = 3,
    slot = "data") &
    scale_color_viridis_c(option = "magma")

DefaultAssay(se) <- "RNA"
plt2 <- FeaturePlot(
    se,
    features = c("PAX5", "EBF1", "EOMES"),
    ncol = 3,
    slot = "data") &
    scale_color_viridis_c(option = "magma")

plt1 / plt2

We can see how the difference is night and day. In the top row we see the TF activities while in the bottom the normalized gene expression. In all 3 examples (PAX5, EBF1, EOMES) the transcript is rarely detected while the activity is well recapitulated.

Heatmap by cells

We can also visualize the top 25 most highly variable TF scores using a heatmap

n_tfs <- 25
# Extract activities from object as a long dataframe
df <- t(as.matrix(se@assays$tfulm@counts)) %>%
    as.data.frame() %>%
    mutate(cluster = Idents(se)) %>%
    pivot_longer(cols = -cluster, names_to = "source", values_to = "score")

# Get top tfs with more variable means across **cells**
tfs <- df %>%
    group_by(source) %>%
    summarise(std = sd(score)) %>%
    arrange(-abs(std)) %>%
    head(n_tfs) %>%
    pull(source)

glue::glue("Note that the maximum scaled value is: {round(max(se@assays$tfulm@data[tfs, ]), 2)}, and the minimum is {round(min(se@assays$tfulm@data[tfs, ]), 2)}.")
Note that the maximum scaled value is: 10, and the minimum is -11.42.
DoHeatmap(
    se,
    features = tfs,
    group.by = "Celltype",
    assay = "tfulm",
    disp.max = quantile(se@assays$tfulm@data[tfs, ], .99),
    disp.min = quantile(se@assays$tfulm@data[tfs, ], .01),
    slot = "data") +
    scale_fill_viridis_c(option = "viridis")

Heatmap by groups

From the plot above we can see how we have very distinct populations in our datasets. We can also look at it a bit less granular by looking at the mean activity score per cluster.

n_tfs <- 25
# Extract activities from object as a long dataframe
df <- t(as.matrix(se@assays$tfulm@data)) %>%
    as.data.frame() %>%
    mutate(cluster = se$Celltype) %>%
    pivot_longer(cols = -cluster, names_to = "source", values_to = "score") %>%
    group_by(cluster, source) %>%
    summarise(mean = mean(score))

# Get top tfs with more variable means across **clusters**
tfs <- df %>%
    group_by(source) %>%
    summarise(std = sd(mean)) %>%
    arrange(-abs(std)) %>%
    head(n_tfs) %>%
    pull(source)

# Transform to wide matrix
top_acts_mat <- df %>%
    filter(source %in% tfs) %>%
    pivot_wider(id_cols = 'cluster', names_from = 'source',
              values_from = 'mean') %>%
    column_to_rownames('cluster') %>%
    as.matrix()

# Choose color palette
palette_length <- 100
my_color <- colorRampPalette(c("Darkblue", "white","red"))(palette_length)

# Show which is the max and min of the scaled value to make sure we set a scale that makes sense
glue::glue("Note that the maximum scaled value is: {round(max(top_acts_mat), 2)}, and the minimum is {round(min(top_acts_mat), 2)}.")
Note that the maximum scaled value is: 5.22, and the minimum is -5.1.
my_breaks <- c(seq(quantile(top_acts_mat, .01), 0, length.out=ceiling(palette_length/2) + 1),
               seq(0.05, quantile(top_acts_mat, .99), length.out=floor(palette_length/2)))
# Plot
ComplexHeatmap::pheatmap(top_acts_mat, border_color = NA, color=my_color, breaks = my_breaks)

From this approach we see how RBCs are driving tis visualization. When this happens we can follow another approach in which we show the top activated TF per cluster.

# Get top tfs with more variable means across **clusters**
tfs_df <- df %>%
    group_by(cluster) %>%
    top_n(n = 5, wt = abs(mean))

# We can see how cluster 6 has the highest TF activity scores
# we are going to take this into account when visualizing the data!
tfs_df %>% data.frame()
                 cluster  source       mean
1           B cell, IgG+    CHD4 -2.2350765
2           B cell, IgG+    EBF1  2.1873061
3           B cell, IgG+    ELF2  1.9539721
4           B cell, IgG+    PAX5  1.6526491
5           B cell, IgG+   PHF5A  2.6884229
6           B cell, IgG-    CHD4 -2.4314153
7           B cell, IgG-    EBF1  2.0782731
8           B cell, IgG-    PAX5  2.0093399
9           B cell, IgG-   PHF5A  2.4176113
10          B cell, IgG-   SATB2  1.9023347
11          CD4, EM-like    BRD4  1.2192357
12          CD4, EM-like   CEBPZ -1.0096314
13          CD4, EM-like  NOTCH1  1.0923722
14          CD4, EM-like    SPIC -1.2293355
15          CD4, EM-like   ZBTB4  1.0367033
16      CD4, non-EM-like     JUN -1.2652165
17      CD4, non-EM-like    NFYB -1.2568654
18      CD4, non-EM-like     REL -1.3204073
19      CD4, non-EM-like    SPIC -1.2473115
20      CD4, non-EM-like  TRERF1  1.3382539
21          CD8, EM-like    BRD4  1.0591918
22          CD8, EM-like   FOXN4  1.1197703
23          CD8, EM-like   HDAC3 -1.1546321
24          CD8, EM-like     HLX  1.0529305
25          CD8, EM-like   LMX1A  1.1323275
26      CD8, non-EM-like    CREM  0.8884825
27      CD8, non-EM-like   KLF13  1.0035781
28      CD8, non-EM-like   PRDM1  0.8602616
29      CD8, non-EM-like   SATB1  1.1965138
30      CD8, non-EM-like    SPIB -0.9008243
31                    DC    RFX1  1.8216438
32                    DC    RFX5  2.2848272
33                    DC  RFXANK  2.3448202
34                    DC   RFXAP  2.3605657
35                    DC    VSX2  1.7193660
36               NK cell   EOMES  1.1452481
37               NK cell    IRF7  0.8983151
38               NK cell   PRDM1  0.9614529
39               NK cell   ZGLP1  1.2036081
40               NK cell  ZNF395  1.2785221
41              Platelet   FOXF1  2.5719136
42              Platelet   FOXI1  2.5542770
43              Platelet     MYC -2.4354752
44              Platelet    PBX2  2.7250495
45              Platelet  PKNOX1  2.8120724
46                   RBC    FGF2 -5.1037959
47                   RBC   HOXB6 -4.8278412
48                   RBC    KLF1  5.0199321
49                   RBC   MIXL1  5.1320526
50                   RBC    NFE2  5.2159351
51    classical Monocyte   CEBPE  1.1357007
52    classical Monocyte   CEBPG  1.1822902
53    classical Monocyte    ELF5  1.2040499
54    classical Monocyte ONECUT1  1.2310754
55    classical Monocyte    SPI1  1.1376659
56 intermediate Monocyte     AP1  1.3116265
57 intermediate Monocyte    ETS1  1.2526442
58 intermediate Monocyte    NFKB  1.2147113
59 intermediate Monocyte    RELA  1.2217876
60 intermediate Monocyte     SP1  1.1835946
61 nonclassical Monocyte  BCL11B  1.5684299
62 nonclassical Monocyte    ETV2  1.5857012
63 nonclassical Monocyte    PAX6  1.5440253
64 nonclassical Monocyte   SOX17 -1.5784436
65 nonclassical Monocyte    ZXDC  1.6864956
# Extract the TF onto a list
tfs <- tfs_df %>%
    pull(source)

# Transform to wide matrix
top_acts_mat <- df %>%
    filter(source %in% tfs) %>%
    pivot_wider(id_cols = 'cluster', names_from = 'source',
              values_from = 'mean') %>%
    column_to_rownames('cluster') %>%
    as.matrix()

# Choose color palette
palette_length <- 100
my_color <- colorRampPalette(c("Darkblue", "white","red"))(palette_length)

# Show which is the max and min of the scaled value to make sure we set a scale that makes sense
glue::glue("Note that the maximum scaled value is: {round(max(top_acts_mat), 2)}, and the minimum is {round(min(top_acts_mat), 2)}.")
Note that the maximum scaled value is: 5.22, and the minimum is -5.1.
my_breaks <- c(seq(quantile(top_acts_mat, .01), 0, length.out=ceiling(palette_length/2) + 1),
               seq(0.05, quantile(top_acts_mat, .99), length.out=floor(palette_length/2)))
# Plot
ComplexHeatmap::pheatmap(top_acts_mat, border_color = NA, color=my_color, breaks = my_breaks)

We can clearly assess that the transcription factors returned are cell type specific since B cells cluster together, and T cells and monocytes as well.

Heatmap by condition & cell type

Lastly we want to check if there are differences in the TF activation depending on cell type and disease

meta_sub <- se@meta.data %>%
    rownames_to_column("bc") %>%
    dplyr::select(bc, Celltype, disease)
df <- t(as.matrix(se@assays$tfulm@data)) %>%
    as.data.frame() %>%
    rownames_to_column("bc") %>%
    left_join(meta_sub, by = "bc") %>%
    column_to_rownames("bc") %>%
    pivot_longer(cols = -c(Celltype, disease), names_to = "source", values_to = "score") %>%
    group_by(Celltype, source, disease) %>%
    summarise(mean = mean(score)) %>%
    mutate(disease_celltype = glue::glue("{disease} - {Celltype}"))

# Get top tfs with more variable means across **clusters**
tfs_df <- df %>%
    group_by(disease_celltype) %>%
    top_n(n = 10, wt = abs(mean))

# We can see how cluster 6 has the highest TF activity scores
# we are going to take this into account when visualizing the data!
# tfs_df %>% data.frame()

# Extract the TF onto a list
tfs <- tfs_df %>%
    pull(source)

# Transform to wide matrix
top_acts_mat <- df %>%
    filter(source %in% tfs) %>%
    pivot_wider(id_cols = 'disease_celltype', names_from = 'source',
              values_from = 'mean') %>%
    column_to_rownames('disease_celltype') %>%
    as.matrix()

# Choose color palette
palette_length <- 100
my_color <- colorRampPalette(c("Darkblue", "white","red"))(palette_length)

# Show which is the max and min of the scaled value to make sure we set a scale that makes sense
glue::glue("Note that the maximum scaled value is: {round(max(top_acts_mat), 2)}, and the minimum is {round(min(top_acts_mat), 2)}.")
Note that the maximum scaled value is: 6.89, and the minimum is -6.55.
my_breaks <- c(seq(quantile(top_acts_mat, .01), 0, length.out=ceiling(palette_length/2) + 1),
               seq(0.05, quantile(top_acts_mat, .99), length.out=floor(palette_length/2)))
# Plot
ComplexHeatmap::pheatmap(top_acts_mat, border_color = NA, color=my_color, breaks = my_breaks)

This heatmap can be overwhelming, we can take a look at specific populations of interest, in this case monocytes and highlight the most highly variable ones:

n_tfs <- 25

# Get top tfs with more variable means across **clusters**
tfs <- df %>%
    filter(str_detect(Celltype, "Monocyte")) %>%
    group_by(source) %>%
    summarise(std = sd(mean)) %>%
    arrange(-abs(std)) %>%
    head(n_tfs) %>%
    pull(source)

# Transform to wide matrix
top_acts_mat <- df %>%
    filter(source %in% tfs) %>%
    filter(str_detect(Celltype, "Monocyte")) %>%
    pivot_wider(id_cols = 'disease_celltype', names_from = 'source',
              values_from = 'mean') %>%
    column_to_rownames('disease_celltype') %>%
    as.matrix()

# Choose color palette
palette_length <- 100
my_color <- colorRampPalette(c("Darkblue", "white","red"))(palette_length)

# Show which is the max and min of the scaled value to make sure we set a scale that makes sense
glue::glue("Note that the maximum scaled value is: {round(max(top_acts_mat), 2)}, and the minimum is {round(min(top_acts_mat), 2)}.")
Note that the maximum scaled value is: 2.04, and the minimum is -1.67.
my_breaks <- c(seq(quantile(top_acts_mat, .01), 0, length.out=ceiling(palette_length/2) + 1),
               seq(0.05, quantile(top_acts_mat, .99), length.out=floor(palette_length/2)))
# Plot
ComplexHeatmap::pheatmap(top_acts_mat, border_color = NA, color=my_color, breaks = my_breaks)

In this last scenario we can see how the cell types cluster by major cell type, not by condition. However, in intermediate monocytes there seems to be an enrichment in interferon response genes in those from influenza or covid patients.

Heatmap for gene expression

To fully grasp which genes are driving each gene signature within each cell we want to visualize the gene expression of the genes involved in each gene signature for each cell. We can do so using the ComplexHeatmap package and a little bit of data processing. For ease here is a function you can incorporate in your analysis:

geneHM <- function(
        object,
        sig_df,
        sig_name,
        sig_assay,
        .source,
        .target,
        sig_slot = "data",
        expr_assay = "RNA",
        expr_slot = "data",
        grouping = NULL,
        grouping_color = NULL,
        expr_cols = viridisLite::magma(100)) {
    
    # Extract Gene Expression Matrix from Seurat Object
    gene_expr <- GetAssayData(object, assay = expr_assay, slot = expr_slot)
    
    # Subset the genes of the signature from the Gene Expression Matrix
    genes_of_interest <- sig_df[, .target][which(sig_df[, .source] %in% sig_name)]
    
    # Subset the genes intersecting between gene expression and genes in signature
    g_int <- intersect(rownames(gene_expr), genes_of_interest)
    
    if (length(g_int) < length(genes_of_interest)) {
        genes_excluded <- genes_of_interest[!genes_of_interest %in% rownames(gene_expr)]
        genes_excluded <- paste(genes_excluded, collapse = ", ")
        message(paste0(
            "Genes ", genes_excluded,
            " are in the gene signature but not in the expression matrix,",
            " therefore, they have been excluded."))
    }
    
    # Subset expression matrix to only genes of interest
    gene_expr <- gene_expr[g_int, ]
    
    # Extract the Scores of the Signature of interest
    sig_score <- GetAssayData(object, assay = sig_assay, slot = sig_slot)
    sig_vec <- sig_score[sig_name, ]
    anno <- data.frame(score = sig_vec)
    # Make sure they are in the right order
    anno <- anno[colnames(gene_expr), , drop = FALSE]
    
    # Add any metadata if specified
    if (!is.null(grouping)) {
        meta <- object@meta.data[, grouping, drop = FALSE]
        anno <- cbind(anno, meta[rownames(anno), , drop = FALSE])
    }
    
    if (any(is.infinite(c(anno$score))))
        stop("There are scores with Inf values, please address this outside of this function. It could be because the slot used is scale_data.")
    
    # Make list of color to paint the annotation columns
    if (!is.null(grouping) & !is.null(grouping_color)) {
        score <- circlize::colorRamp2(
            breaks = c(min(anno$score), 0, max(anno$score)),
            colors = c("blue", "white", "red"))
        
        color_ls <- append(grouping_color, score)
        names(color_ls)[length(color_ls)] <- "score"
        
    } else {
        color_ls <- list(
            score = circlize::colorRamp2
            (breaks = c(min(anno$score), 0, max(anno$score)),
                      colors = c("blue", "white", "red")),
            RNA_snn_res.0.15 = clust_color)
    }
    
    # Set the order from most expressing to least expressing 
    ord <- rownames(anno[order(anno$score, decreasing = TRUE), ])
    # Add the score of the signature as annotation in the heatmap
    colAnn <- HeatmapAnnotation(
        df = anno[ord, , drop = FALSE],
        which = 'column',
        col = color_ls)
    
    # Visualize the Heatmap with the genes and signature 
    ht <- ComplexHeatmap::Heatmap(
        as.matrix(gene_expr[, ord]),
        name = "Gene Expression",
        col = expr_cols,
        cluster_rows = TRUE,
        cluster_columns = TRUE,
        column_title = sig_name,
        column_names_gp = gpar(fontsize = 14),
        show_column_names = FALSE,
        top_annotation = colAnn)
    
    # Return ComplexHeatmap
    draw(ht)
}

pal
         B cell, IgG+          B cell, IgG-          CD4, EM-like      CD4, non-EM-like          CD8, EM-like      CD8, non-EM-like                    DC               NK cell              Platelet                   RBC    classical Monocyte intermediate Monocyte nonclassical Monocyte 
            "#000000"             "#005555"             "#55859E"             "#FF91C8"             "#853CAA"             "#0C5ACE"             "#B66DFF"             "#79BCFF"             "#AA91A9"             "#922400"             "#C26000"             "#42E61E"             "#FFFF6D" 
# Visualize the heatmaps for all signatures
# tt <- lapply(unique(sig_df$signature), function(i) {
#     geneHM(
#         object = se,
#         sig_df = sig_df,
#         .source = "signature",
#         .target = "gene",
#         sig_name = i,
#         expr_slot = "data",
#         expr_assay = "RNA",
#         sig_assay = "pathwaysulm",
#         sig_slot = "data",
#         grouping = c("RNA_snn_res.0.15"),
#         grouping_color = list(RNA_snn_res.0.15 = clust_color))
# })

Here are some examples of how to interpret these gene signatures:

  1. Diving deeper into the EBF1 signature we can see how the cells with a high activity score have a high expression of CD79A/B which have a mode of regulation of 1. In turn, those with a negative score have a high expression of GATA3 whose mor is -1, thus indicating that the EBF1 pathway is turned off.
# Subset to 10% of the original dataset for speed and visualization purposes
se_sub <- se[, sample(colnames(se), 0.1 * ncol(se))]

geneHM(
    object = se_sub,
    sig_df = data.frame(net),
    .source = "source",
    .target = "target",
    sig_name = "EBF1",
    expr_slot = "data",
    expr_assay = "RNA",
    sig_assay = "tfulm",
    sig_slot = "data",
    grouping = c("Celltype"),
    grouping_color = list(Celltype = pal))

net %>% dplyr::filter(source == "EBF1" & target %in% c("CD79A", "CD79B", "GATA3"))
# A tibble: 3 × 3
  source target   mor
  <chr>  <chr>  <dbl>
1 EBF1   CD79A      1
2 EBF1   GATA3     -1
3 EBF1   CD79B      1

Briefly, when scoring TF activities and looking at their activities it is important to not only look at the overall score obtained for each cell but one also needs to dive deeper into which are the genes that are driving that signature!

Lastly we can also look at the TF activity per cell type by condition of interest as below:

disease_pal <- c("#41AE76", "#225EA8", "#E31A1C")
names(disease_pal) <- c("normal", "influenza", "COVID-19")
# make a donor palette
donor_pal <- c(
    "#66C2A4", "#41AE76", "#238B45", "#006D2C",
    "#41B6C4", "#1D91C0", "#225EA8", "#253494", "#081D58",
    "#FFEDA0", "#FED976", "#FEB24C", "#FD8D3C",
    "#FC4E2A", "#E31A1C", "#BD0026", "#800026")

names(donor_pal) <- c(
    "Normal 1", "Normal 2", "Normal 3", "Normal 4",
    "Flu 1", "Flu 2", "Flu 3", "Flu 4", "Flu 5",
    "nCoV 1", "nCoV 2", "nCoV 3_4", "nCoV 5",
    "nCoV 6", "nCoV 7_8", "nCoV 9_10", "nCoV 11"  
)

dd <- bind_cols(se@meta.data, t(as.matrix(se@assays$tfulm@counts))) %>%
        tidyr::pivot_longer(
            cols = rownames(se@assays$tfulm@counts),
            names_to = "TF",
            values_to = "score") %>%
        group_by(Celltype, disease, donor_id, TF) %>%
        summarise(median_score = median(score))

# Let's look at some TF of interest
lapply(c("IRF1", "IRF2", "IRF6", "IRF9"), function(i) {
    dd %>%
        filter(TF == i) %>%
        ggplot(aes(x = disease, y = median_score, fill = disease)) +
        geom_boxplot(outlier.shape = NA, alpha = 0.7) +
        geom_jitter(aes(color = donor_id), height = 0) +
        facet_wrap(~ Celltype) +
        scale_fill_manual(values = disease_pal) +
        scale_color_manual(values = donor_pal) +
        scale_x_discrete(limits = names(disease_pal)) +
        theme_classic() +
        labs(title = glue::glue("Factors by condition in {i}"))
})
[[1]]


[[2]]


[[3]]


[[4]]

Session Info

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggpmisc_0.5.4-1       ggpp_0.5.4            colorBlindness_0.1.9  ComplexHeatmap_2.16.0 decoupleR_2.9.1       lubridate_1.9.2       forcats_1.0.0         stringr_1.5.1         dplyr_1.1.4           purrr_1.0.2           readr_2.1.4           tidyr_1.3.1           tibble_3.2.1          ggplot2_3.5.0         tidyverse_2.0.0       Seurat_5.0.2          SeuratObject_5.0.1    sp_2.1-3             

loaded via a namespace (and not attached):
  [1] RcppAnnoy_0.0.22       splines_4.3.1          later_1.3.2            cellranger_1.1.0       polyclip_1.10-6        confintr_1.0.2         rpart_4.1.19           fastDummies_1.7.3      lifecycle_1.0.4        doParallel_1.0.17      globals_0.16.2         lattice_0.21-8         vroom_1.6.3            MASS_7.3-60            backports_1.4.1        magrittr_2.0.3         Hmisc_5.1-0            plotly_4.10.4          rmarkdown_2.25         yaml_2.3.8             remotes_2.4.2.1        httpuv_1.6.14          sctransform_0.4.1      spam_2.10-0            spatstat.sparse_3.0-3  reticulate_1.35.0.9000 cowplot_1.1.3          pbapply_1.7-2          RColorBrewer_1.1-3     abind_1.4-5            rvest_1.0.3            Rtsne_0.17             BiocGenerics_0.46.0    nnet_7.3-19            rappdirs_0.3.3         circlize_0.4.15        IRanges_2.34.1         S4Vectors_0.38.1       ggrepel_0.9.5          irlba_2.3.5.1          listenv_0.9.1          spatstat.utils_3.0-4   MatrixModels_0.5-2     goftest_1.2-3          RSpectra_0.16-1        spatstat.random_3.2-2  fitdistrplus_1.1-11    parallelly_1.37.0      leiden_0.4.3.1         codetools_0.2-19       xml2_1.3.5            
 [52] tidyselect_1.2.0       shape_1.4.6            farver_2.1.1           base64enc_0.1-3        matrixStats_1.2.0      stats4_4.3.1           spatstat.explore_3.2-6 jsonlite_1.8.8         GetoptLong_1.0.5       Formula_1.2-5          ellipsis_0.3.2         progressr_0.14.0       ggridges_0.5.6         survival_3.5-7         iterators_1.0.14       foreach_1.5.2          progress_1.2.2         tools_4.3.1            ica_1.0-3              Rcpp_1.0.12            glue_1.7.0             gridExtra_2.3          xfun_0.42              withr_3.0.0            BiocManager_1.30.22    fastmap_1.1.1          fansi_1.0.6            SparseM_1.81           digest_0.6.34          timechange_0.2.0       R6_2.5.1               mime_0.12              gridGraphics_0.5-1     colorspace_2.1-0       Cairo_1.6-1            scattermore_1.2        tensor_1.5             spatstat.data_3.0-4    utf8_1.2.4             generics_0.1.3         data.table_1.15.0      prettyunits_1.1.1      httr_1.4.7             htmlwidgets_1.6.4      uwot_0.1.16            pkgconfig_2.0.3        gtable_0.3.4           lmtest_0.9-40          selectr_0.4-2          OmnipathR_3.8.0        htmltools_0.5.7       
[103] dotCall64_1.1-1        clue_0.3-64            scales_1.3.0           png_0.1-8              knitr_1.45             rstudioapi_0.15.0      tzdb_0.4.0             reshape2_1.4.4         rjson_0.2.21           curl_5.2.0             checkmate_2.2.0        nlme_3.1-163           zoo_1.8-12             GlobalOptions_0.1.2    KernSmooth_2.23-22     parallel_4.3.1         miniUI_0.1.1.1         foreign_0.8-84         logger_0.2.2           pillar_1.9.0           vctrs_0.6.5            RANN_2.6.1             promises_1.2.1         xtable_1.8-4           cluster_2.1.4          htmlTable_2.4.1        evaluate_0.23          magick_2.7.5           cli_3.6.2              compiler_4.3.1         rlang_1.1.3            crayon_1.5.2           future.apply_1.11.1    labeling_0.4.3         plyr_1.8.9             stringi_1.8.3          viridisLite_0.4.2      deldir_2.0-2           munsell_0.5.0          lazyeval_0.2.2         spatstat.geom_3.2-8    quantreg_5.97          Matrix_1.6-5           RcppHNSW_0.6.0         hms_1.1.3              patchwork_1.2.0        bit64_4.0.5            future_1.33.1          shiny_1.8.0            ROCR_1.0-11            igraph_2.0.2          
[154] bit_4.0.5              readxl_1.4.3           polynom_1.4-1         

References

Badia-i-Mompel, Pau, Jesús Vélez Santiago, Jana Braunger, Celina Geiss, Daniel Dimitrov, Sophia Müller-Dott, Petr Taus, et al. 2022. “decoupleR: Ensemble of Computational Methods to Infer Biological Activities from Omics Data.” Edited by Marieke Lydia Kuijjer. Bioinformatics Advances 2 (1). https://doi.org/10.1093/bioadv/vbac016.